[DRAFT] Vllm fused moe #3687

Draft
NuojCheng wants to merge 4 commits into main from chengnuojin-vllm-rule

NuojCheng commented Apr 16, 2026

Description

Example command

NEW_MODEL_DESIGN=1 python3 -m maxtext.inference.vllm_decode src/maxtext/configs/base.yml \
    model_name=qwen3-30b-a3b \
    tokenizer_path=Qwen/Qwen3-30B-A3B \
    vllm_hf_overrides='{architectures: ["MaxTextForCausalLM"]}' \
    load_parameters_path=gs://parambole-qwen3-moe-verification/unscanned/qwen3-30b-a3b-thinking-2507/14_08_2025/0/items \
    ici_expert_parallelism=4 \
    hbm_utilization_vllm=0.5 \
    prompt="Tell me three fun facts about Buenos Aires." \
    decode_sampling_temperature=0.0 \
    decode_sampling_nucleus_p=1.0 \
    decode_sampling_top_k=0.0 \
    pure_nnx_decoder=True \
    use_chat_template=True \
    sparse_matmul=True \
    prefuse_moe_weights=True \
    debug_sharding=True 2>&1 | tee qwen3_30b_vllm_ep.log

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

NuojCheng force-pushed the chengnuojin-vllm-rule branch from bda614a to 4d486c4 on April 17, 2026 17:49
NuojCheng added the draft (Draft PR) label on Apr 17, 2026
NuojCheng force-pushed the chengnuojin-vllm-rule branch from 8154d78 to aa840a7 on April 17, 2026 21:55
NuojCheng force-pushed the chengnuojin-vllm-rule branch from aa840a7 to 0f4beb8 on April 17, 2026 23:19